If the data set contains Sunday data, data analysis (Multi-Variable Analysis and Graphical Analysis) for weekdays does not go well. In this case, the data needs to be edited.
For small data, editing is easy. This page describes methods for big data.
Make Data of Meta Knowledge might be needed before using the methods on this page.
The code on this page starts from the point where there is an input data frame named "df".
If the starting data is a CSV file, the code to make "df" is below.
import pandas as pd  # Load the pandas package
df = pd.read_csv("Data.csv")  # Read the CSV file into a data frame
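After reading, it may help to check what was actually loaded. The sketch below is only an illustration: it reads a small made-up CSV text instead of "Data.csv", and the column names C1 and X1 and their values are assumptions.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "Data.csv"
csv_text = "C1,X1\nA1,2.5\nA2,3.5\nA1,4.5\n"

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column
print(df.head())  # first rows of the data
```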
Stratified Sampling is a method to get the data we should analyze.
For example, the code below gets only the rows where C1 is "A1".
df[df.C1 == 'A1']
If we need the data where X1 is between 3 and 4, the code is below.
df[(df.X1 > 3) & (df.X1 < 4)]
AND condition --> " & "
OR condition --> " | "
NOT condition --> " ~ "
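The three operators can be tried on a small data frame. This is a minimal sketch with made-up values. The names both, either, not_a1 and several are only for illustration. Each condition is wrapped in parentheses because & and | bind more tightly than the comparison operators in Python.

```python
import pandas as pd

# Made-up data; C1 and X1 follow the column names used on this page
df = pd.DataFrame({
    "C1": ["A1", "A2", "A1", "A3"],
    "X1": [2.5, 3.5, 3.2, 4.5],
})

both = df[(df.C1 == "A1") & (df.X1 > 3)]    # AND: C1 is A1 and X1 is above 3
either = df[(df.C1 == "A1") | (df.X1 > 4)]  # OR: C1 is A1 or X1 is above 4
not_a1 = df[~(df.C1 == "A1")]               # NOT: every row except A1
several = df[df.C1.isin(["A1", "A3"])]      # C1 is one of several values

print(both)
print(either)
print(not_a1)
print(several)
```

The same NOT pattern can be used to drop unwanted rows, such as Sunday data, if the data set has a column that identifies them.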
Principal Component Analysis is one of the methods to reduce the number of dimensions.
But the easiest way is to select only the columns we need, as below.
df.X1
or
df.loc[:,['X1']]
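The two forms above do not return the same type, which can matter in later steps. A minimal sketch with made-up data follows. The column X2 and the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up data with one category column and two numeric columns
df = pd.DataFrame({
    "C1": ["A1", "A2"],
    "X1": [1.0, 2.0],
    "X2": [3.0, 4.0],
})

s = df.X1                      # attribute access returns a Series
one = df.loc[:, ["X1"]]        # a list of names returns a DataFrame
two = df.loc[:, ["X1", "X2"]]  # several columns at once
dropped = df.drop(columns=["C1"])  # or remove the columns that are not needed

print(type(s).__name__, type(one).__name__)
print(two.shape, list(dropped.columns))
```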
To stack two data sets vertically, concat is used.
df3 = pd.concat([df1, df2])
If a variable exists in only one of the data sets, its values for the rows of the other data set become missing values ("NaN").
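A small sketch of this behavior, with made-up data frames: df1 has no X2 and df2 has no X1, so both columns get "NaN" after stacking. The option ignore_index=True is an addition here, used to renumber the rows from 0.

```python
import pandas as pd

# Made-up data sets with one shared column (C1) and one unshared column each
df1 = pd.DataFrame({"C1": ["A1", "A2"], "X1": [1.0, 2.0]})
df2 = pd.DataFrame({"C1": ["A3"], "X2": [9.0]})

# Stack the rows; columns are matched by name
df3 = pd.concat([df1, df2], ignore_index=True)

print(df3)
print(df3.isna().sum())  # number of missing values in each column
```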
To join two data sets side by side, axis=1 is added.
df3 = pd.concat([df1, df2], axis=1)
For example, the case below occurs.
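One such case can be sketched with made-up data frames (the values are assumptions). With axis=1, the rows are lined up by the index (0, 1, 2), not by the values in C1, so different categories can end up on the same row.

```python
import pandas as pd

# Made-up data sets; df2 has fewer rows and a different order of C1
df1 = pd.DataFrame({"C1": ["A1", "A2", "A3"], "X1": [1, 2, 3]})
df2 = pd.DataFrame({"C1": ["A3", "A1"], "X2": [30, 10]})

# Side-by-side join: rows are aligned on the index only
df3 = pd.concat([df1, df2], axis=1)

print(df3)
# Row 0 holds A1 from df1 next to A3 from df2,
# and row 2 has "NaN" in the columns that come from df2.
```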
If A1, A2 and A3 of each data set need to be matched with each other, merge (the method below) is better.
This example code merges the right data (df2) into the left data (df1). In Excel, the equivalent is the VLOOKUP function.
df3 = pd.merge(df1, df2, how='left')
If we do not need rows with "NaN", we use "inner" instead of "left". If we also need the rows that exist only in the right data, which have "NaN" in the left side columns, we use "outer" instead of "left".
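The three options can be compared on small made-up data frames, as in the sketch below. The key column is written explicitly with on="C1" here. When "on" is omitted, merge uses the column names that the two data frames have in common.

```python
import pandas as pd

# Made-up data sets; A2 exists only in df1 and A4 exists only in df2
df1 = pd.DataFrame({"C1": ["A1", "A2", "A3"], "X1": [1, 2, 3]})
df2 = pd.DataFrame({"C1": ["A1", "A3", "A4"], "X2": [10, 30, 40]})

left = pd.merge(df1, df2, on="C1", how="left")    # keep every row of df1
inner = pd.merge(df1, df2, on="C1", how="inner")  # keep only keys found in both
outer = pd.merge(df1, df2, on="C1", how="outer")  # keep every key from either side

print(left)   # A2 has "NaN" in X2
print(inner)  # only A1 and A3 remain
print(outer)  # A2 has "NaN" in X2, A4 has "NaN" in X1
```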